Abstract. One of the most promising applications of mathematical 
knowledge management is search: Even if we restrict attention to the 
tiny fragment of mathematics that has been formalized, the amount ex- 
ceeds the comprehension of an individual human. 

Based on the generic representation language MMT, we introduce the 
mathematical query language QMT: It combines simplicity, expressiv- 
ity, and scalability while avoiding a commitment to a particular logical 
formalism. QMT can integrate various search paradigms such as unifica- 
tion, semantic web, or XQuery style queries, and QMT queries can span 
different mathematical libraries. 

We have implemented QMT as a part of the MMT API. This combina- 
tion provides a scalable indexing and query engine that can be readily 
applied to any library of mathematical knowledge. While our focus here 
is on libraries that are available in a content markup language, QMT 
naturally extends to presentation and narration markup languages. 



1 Introduction and Related Work 

Mathematical knowledge management applications are particularly strong at 
large scales, where automation can be significantly superior to human intuition. 
This makes search and retrieval pivotal MKM applications: The more the amount 
of mathematical knowledge grows, the harder it becomes for users to find relevant 
information. Indeed, even expert users of individual libraries can have difficulties 
reusing an existing development because they are not aware of it. Therefore, this 
question has received much attention. 

Object query languages augment standard text search with phrase queries 
that match mathematical operators and with wild cards that match arbitrary 
mathematical expressions. Abstractly, an object query engine is based on an index,
which is a set of pairs (l, o) where o is an object and l is the location of o. The
index is built from a collection of mathematical documents, and the result of an
object query is a subset of the index. The object model is usually based on
presentation MathML and/or content MathML/OpenMath [W3C03,BCC+04], but
importers can be used to index other formats such as LaTeX. Examples of
object query languages and engines are given in [MY03,MM06,MG08,KŞ06,SL11].
A partial overview can be found in [SL11]. A central question is the use of wild
cards. An example language with complex wild cards is given in [AY08]. Most
generally, [KŞ06] uses unification queries that return all objects that can be
unified with the query.

Property query languages are similar to object query languages except
that both the index and the query use relational information that abstracts
from the mathematical objects. For example, the relational index might store
the toplevel symbol of every object or the "used-in" relation between statements.
This approximates an object index, and many property queries are special cases 
of object queries. But property queries are simpler and more efficient, and they 
still cover many important examples. Such languages are given in [GC03,AS04] 
and [BR03] based on the Coq and Mizar libraries, respectively. 

Compositional query languages focus on a complex language of query 
expressions that are evaluated compositionally. The atomic queries are provided
by the elements of the queried library. SQL [ANS03] uses n-ary relations be- 
tween elements, and query expressions use the algebra of relations. The SPARQL 
[W3C08] data model is RDF, and queries focus on unary and binary predicates 
on a set of URIs of statements. This could serve as the basis for mathematics 
on the semantic web. Both data models match bibliographical meta-data and 
property-based indices and could also be applied to the results of object queries 
(seen as sets of pairs); but they are not well-suited for expressions. The XQuery 
[W3C07] data model is XML, and query expressions are centered around oper- 
ations on lists of XML nodes. This is well-suited for XML-based markup lan- 
guages for mathematical documents and expressions and was applied to OMDoc 
[Koh06] in [ZK09]. In [KRZ10], the latter was combined with property queries. 
Very recently [ADL12] gave a compositional query language for hiproof proof 
trees that integrates concepts from both object and property queries. 

A number of individual libraries of mathematics provide custom query 
functionality. Object query languages are used, for example, in [LM06] for
ActiveMath or in Wolfram|Alpha. Most interactive proof assistants permit some
object or property queries, primarily to search for theorems that are applicable
to a certain goal, e.g., Isabelle, Coq, and Matita. [Urb06] is notable for using
automated reasoning to prepare an index of all Mizar theorems. 

It is often desirable to combine several of the above formalisms in the same 
query. Therefore, we have designed the query language QMT with the goal of 
permitting as many different query paradigms as possible. QMT uses a simple 
kernel syntax in which many advanced query paradigms can be defined. This per- 
mits giving a formal syntax, a formal semantics, and a scalable implementation, 
all of which are presented in this paper. 

QMT is grounded in the MMT language (Module System for Mathematical
Theories) [RK11], a scalable, modular representation language for mathematical
knowledge. It is designed as a scalable trade-off between (i) a logical framework
with formal syntax and semantics and (ii) an MKM framework that does not
commit to any particular formal system. Thus, MMT permits both adequate
representations of virtually any formal system and the implementation of
generic MKM services. We implement QMT on top of our MMT system, which
provides a flexible and scalable query API and query server.

Fig. 1. QMT Notions and their Intuitions



Our design has two pivotal strengths. Firstly, QMT can be applied to the 
libraries of any formal system that is represented as Mmt. Queries can even span 
libraries of different systems. Secondly, QMT queries can make use of other Mmt 
services. For example, queries can access the inferred type and the presentation 
of a found expression, which are computed dynamically. 

We split the definition of QMT into two parts. Firstly, Sect. 2 defines QMT 
signatures in general and then the syntax and semantics of QMT for an arbitrary 
signature. Secondly, Sect. 3 describes a specific QMT signature that we use 
for Mmt libraries. Our implementation, which is based on that signature, is 
presented in Sect. 4. 

2 The QMT Query Language 
2.1 Syntax 

Our syntax arises by combining features of sorted first-order logic (which leads
to very intuitive expressions) and of description logics (which leads to efficient
evaluations). Therefore, our signatures $\Sigma$ contain five kinds of declarations as
given in Fig. 1.

For a given signature, we define four kinds of expressions: types T, relations 
R, propositions F, and typed queries Q as listed in Fig. 1. The grammar for 
signatures and expressions is given in Fig. 2. 

The intuitions for most expression formation operators can be guessed easily 
from their notations. In the following we will discuss each in more detail. 

Regarding types T, we use product types and power types. However, we go
out of our way to avoid arbitrary nestings of type constructors. Every type is
either a product $t = a_1 \times \ldots \times a_n$ of base types $a_i$ or the power type $set(t)$ of
such a type. Thus, we are able to use the two most important type formation
operators in the context of querying: product types arise when a query contains
multiple query variables, and power types arise when a query returns multiple
results. But at the same time, the type system remains very simple and can be
treated as essentially first-order.

Fig. 2. The Grammar for Query Expressions

Regarding relations, we provide the common operations from the calculus
of binary relations: dual/inverse $R^{-1}$, transitive closure $R^*$, composition $R;R'$,
union $R \cup R'$, intersection $R \cap R'$, and difference $R \setminus R'$. Notably absent is the
absolute complement operation $R^c$; it is omitted because its semantics cannot be
computed efficiently in general. Note that the operation $R^{-1}$ is only necessary
for atomic $R$: For all other cases, we can put $(R^*)^{-1} = (R^{-1})^*$,
$(R;R')^{-1} = R'^{-1};R^{-1}$, and $(R * R')^{-1} = R^{-1} * R'^{-1}$ for $* \in \{\cup, \cap, \setminus\}$.
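The calculus above can be sketched over finite relations represented as Python sets of pairs. This is only an illustration under that assumption, not the QMT implementation; the function names `inverse`, `compose`, and `transitive_closure` and the toy relations are introduced here for the example. The final assertions check the three identities that make the inverse necessary only for atomic relations.

```python
def inverse(r):
    """Dual/inverse relation: swap the components of every pair."""
    return {(v, u) for (u, v) in r}

def compose(r, s):
    """Composition r;s: all (u, w) with (u, v) in r and (v, w) in s."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}

def transitive_closure(r):
    """Smallest transitive relation containing r (naive fixpoint iteration)."""
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:
            return closure
        closure = step

# Toy relations (hypothetical data).
R = {(1, 2), (2, 3), (3, 4)}
S = {(2, 5), (3, 5), (1, 2)}

# The inverse can be pushed inside every other operator.
assert inverse(transitive_closure(R)) == transitive_closure(inverse(R))
assert inverse(compose(R, S)) == compose(inverse(S), inverse(R))
for op in (set.union, set.intersection, set.difference):
    assert inverse(op(R, S)) == op(inverse(R), inverse(S))
```

The naive fixpoint iteration is chosen for readability; a real engine would more plausibly traverse a precomputed index.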

Regarding propositions, we use the usual constructors of classical first-order
logic: predicates, negation, conjunction, and universal quantification. As usual,
the other constructors are definable. However, there is one specialty: The
quantification $\forall x \in Q.F(x)$ does not quantify over a type $t$; instead, it is relativized
by a query result $Q : set(t)$. This specialty is meant to support efficient
evaluation: The extension of a base type is usually much larger than that of a query,
and it may not be efficiently computable or not even finite.

Regarding queries, our language combines intuitions from description and
first-order logic with an intuitive mathematical notation. Constants $c$, variables
$x$, and function application are as usual for sorted first-order logic. $Q_1 * \ldots * Q_n$
for $n \in \mathbb{N}$ and $Q_i$ for $i = 1, \ldots, n$ denote tupling and projection. $R(Q)$ represents
the image of the object given by $Q$ under the relation given by $R$. $\bigcup_{x \in Q} Q'(x)$
denotes the union of the family of queries $Q'(x)$ where $x$ runs over all objects in
the result of $Q$. Finally, $\{x \in Q \mid F(x)\}$ denotes comprehension on queries, i.e.,
the objects in $Q$ that satisfy $F$. Just like for the universal quantification, all
bound variables are relativized to a query result to support efficient evaluation.
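To illustrate why relativized binders support efficient evaluation, the following sketch evaluates the image, union, comprehension, and quantification operators over finite Python sets. The helper names (`image`, `big_union`, `comprehension`, `forall`) and the toy data are assumptions made for this example. Every binder iterates only over the result of a query, never over a whole base type.

```python
def image(relation, u):
    """R(Q): image of the single object u under a relation given as a set of pairs."""
    return {w for (v, w) in relation if v == u}

def big_union(q, body):
    """Union of body(x) for x ranging over the query result q."""
    result = set()
    for x in q:
        result |= body(x)
    return result

def comprehension(q, pred):
    """{x in q | pred(x)}: the elements of q that satisfy pred."""
    return {x for x in q if pred(x)}

def forall(q, pred):
    """Relativized universal quantifier: pred holds for every x in q."""
    return all(pred(x) for x in q)

# Hypothetical toy data: a successor-like relation on small integers.
succ = {(0, 1), (1, 2), (2, 3)}
start = {0, 1}

reachable = big_union(start, lambda x: image(succ, x))
evens = comprehension(reachable, lambda x: x % 2 == 0)
all_positive = forall(reachable, lambda x: x > 0)
```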

Remark 1. While we do not present a systematic analysis of the efficiency of 
QMT, we point out that we designed the syntax of QMT with the goal of sup- 
porting efficient evaluation. In particular, this motivated our distinction between 
the ontology part, i.e., concept and relation symbols, and the first-order part, 
i.e., the function and predicate symbols. 

Indeed, every concept $c < t$ can be regarded as a function symbol $c : set(t)$,
and every relation $r < a, a'$ as a predicate symbol $r : a, a' \to prop$. Thus, the
ontology symbols may appear redundant; their purpose is to permit efficient
evaluations. This is most apparent for relations. For a predicate symbol
$p : a, a' \to prop$, evaluation requires a method that maps from $[\![a]\!] \times [\![a']\!]$ to booleans.
But for a relation symbol $r < a, a'$, evaluation requires a method that returns for
any $u$ all $w$ such that $(u, w) \in [\![r]\!]$ or all $v$ such that $(v, u) \in [\![r]\!]$. A corresponding
property applies to concepts.

Fig. 3. Well-Formed Signatures

Therefore, efficient implementations of QMT should maintain indices for 
them that are computed a priori: hash sets for the concept symbols and hash 
tables for the relation symbols. (Note that using hash tables for all relation
symbols permits fast evaluation of all relation expressions $R$, which is crucial for
the evaluation of queries $R(Q)$.) The implementation of function and predicate
symbols, on the other hand, only requires plain functions that are called when 
evaluating a query. 

Thus, it is a design decision whether a certain feature is realized by an on- 
tology or by a first-order symbol. By separating the ontology and the first-order 
part, we permit simple indices for the former and retain flexible extensibility for 
the latter (see also Rem. 2). 
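The division of labor described in this remark can be sketched as follows. The class name `RelationIndex`, the identifiers, and the helper `is_theory` are hypothetical and only serve to show the shape of the data structures: a hash set per concept symbol, a pair of hash tables per relation symbol (so that both images and preimages are cheap), and a plain function per predicate symbol.

```python
from collections import defaultdict

class RelationIndex:
    """Hash-table index for one relation symbol, queryable in both directions."""

    def __init__(self, pairs):
        self.forward = defaultdict(set)   # u -> all w with (u, w) in r
        self.backward = defaultdict(set)  # u -> all v with (v, u) in r
        for u, w in pairs:
            self.forward[u].add(w)
            self.backward[w].add(u)

    def image(self, u):
        """All w such that (u, w) is in the relation."""
        return set(self.forward.get(u, ()))

    def preimage(self, u):
        """All v such that (v, u) is in the relation."""
        return set(self.backward.get(u, ()))

# Concept symbols are plain hash sets (the identifiers are hypothetical).
theory = {"Monoid", "Group", "Ring"}
includes = RelationIndex([("Group", "Monoid"), ("Ring", "Group")])

def is_theory(u):
    """A predicate symbol needs no index, only a function that is called on demand."""
    return u in theory
```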



Based on these intuitions, it is straightforward to define the well-formed
expressions, i.e., the expressions that will have a denotational semantics.
More formally, we use the judgments given in Fig. 5 to define the well-formed
expressions over a signature $\Sigma$ and a context $\Gamma$. The rules for these
judgments are given in Fig. 3 and 4.

Fig. 5. Judgments (intuitions: well-formed signature $\Sigma$; well-formed type $T$;
well-typed query $Q$ of type $T$; well-typed relation $R$ between $a$ and $a'$;
well-formed proposition $F$, written $\Gamma \vdash_\Sigma F : prop$)

In order to give some meaningful examples, we will already make use of the 
symbols from the MMT signature, which we will introduce in Sect. 3. 

Fig. 4. Well-Formed Expressions 



Example 1. Consider a base type $id : type$ of MMT identifiers in some fixed
MMT library. Moreover, consider a concept symbol $theory < id$ giving the
identifiers of all theories, and a relation symbol $includes < id, id$ that gives the
relation "theory A directly includes theory B".

Then the query $theory$ of type $set(id)$ yields the set of all theories. Given a
theory $u$, the query $includes^*(u)$ of type $set(id)$ yields the set of all theories
that $u$ transitively includes.

Example 2 (Continued). Additionally, consider a concept $constant < id$ of
identifiers of MMT constants, a relation symbol $declares < id, id$ that relates every
theory to the constants declared in it, a base type $obj : type$ of OpenMath
objects, a function symbol $type : id \to obj$ that maps each MMT constant to its
type, and a predicate symbol $occurs : id, obj \to prop$ that determines whether
an identifier occurs in an object.

Then the following query of type $set(id)$ retrieves all constants that are
included into the theory $u$ and whose type uses the identifier $v$:

$\{x \in (includes^*; declares)(u) \mid occurs(v, type(x))\}$
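The query of Example 2 can be traced on a small toy library. Everything below is an assumption made for illustration: the theory and constant names, the encoding of relations as sets of pairs, and the modelling of a type as the tuple of identifiers that occur in it.

```python
def compose(r, s):
    """Composition r;s of two relations given as sets of pairs."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}

def transitive_closure(r):
    """Transitive closure by naive fixpoint iteration."""
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:
            return closure
        closure = step

def image(r, u):
    """All w with (u, w) in r."""
    return {w for (v, w) in r if v == u}

# Hypothetical library: (A, B) in includes means "A directly includes B".
includes = {("Group", "Monoid"), ("Monoid", "Magma")}
declares = {("Magma", "carrier"), ("Magma", "op"),
            ("Monoid", "unit"), ("Group", "inv")}
# Types are modelled as tuples of the identifiers occurring in them.
type_of = {"carrier": ("type",), "op": ("carrier", "carrier", "carrier"),
           "unit": ("carrier",), "inv": ("carrier", "carrier")}

def occurs(v, obj):
    """Predicate symbol: does the identifier v occur in the object obj?"""
    return v in obj

def query(u, v):
    """{x in (includes*; declares)(u) | occurs(v, type(x))}"""
    candidates = image(compose(transitive_closure(includes), declares), u)
    return {x for x in candidates if occurs(v, type_of[x])}

result = query("Group", "carrier")
```

In this toy library, `inv` is declared in `Group` itself and is therefore not among the candidates, because the sketch takes the closure to be transitive rather than reflexive-transitive, following the definitions in the text.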
2.2 Semantics 

A $\Sigma$-model assigns to every symbol $s$ in $\Sigma$ a denotation. The formal definition is
given in Def. 1. Relative to a fixed model $M$ (which we suppress in the notation),
each well-formed expression has a well-defined denotational semantics, given by
the interpretation function $[\![-]\!]$. The semantics of propositions and queries in
context $\Gamma$ is relative to an assignment $\alpha$, which assigns values to all variables
in $\Gamma$. An overview is given in Fig. 6. The formal definition is given in Def. 2.

Fig. 6. Semantics of Judgments

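Definition 1 (Models). A $\Sigma$-model $M$ assigns to every $\Sigma$-symbol $s$ a
denotation $s^M$ such that

- $a^M$ is a set for $a : type$
- $c^M \subseteq [\![a]\!]$ for $c < a$
- $r^M \subseteq [\![a]\!] \times [\![a']\!]$ for $r < a, a'$
- $f^M : [\![T_1]\!] \times \ldots \times [\![T_n]\!] \to [\![T]\!]$ for $f : T_1, \ldots, T_n \to T$
- $p^M : [\![T_1]\!] \times \ldots \times [\![T_n]\!] \to \{0, 1\}$ for $p : T_1, \ldots, T_n \to prop$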



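Definition 2 (Semantics). Given a $\Sigma$-model $M$, the interpretation function
$[\![-]\!]$ is defined as follows.

Semantics of types:

- $[\![a_1 \times \ldots \times a_n]\!]$ is the cartesian product $a_1^M \times \ldots \times a_n^M$
- $[\![set(t)]\!]$ is the power set of $[\![t]\!]$

Semantics of relations:

- $[\![r]\!] = r^M$
- $[\![R^{-1}]\!]$ is the dual/inverse relation of $[\![R]\!]$, i.e., the set $\{(u, v) \mid (v, u) \in [\![R]\!]\}$
- $[\![R^*]\!]$ is the transitive closure of $[\![R]\!]$
- $[\![R;R']\!]$ is the composition of $[\![R]\!]$ and $[\![R']\!]$,
  i.e., the set $\{(u, w) \mid \text{exists } v \text{ such that } (u, v) \in [\![R]\!], (v, w) \in [\![R']\!]\}$
- $[\![R \cup R']\!]$, $[\![R \cap R']\!]$, and $[\![R \setminus R']\!]$ are interpreted in the obvious way using the union,
  intersection, and difference of sets

Semantics of propositions under an assignment $\alpha$:

- $[\![p(Q_1, \ldots, Q_n)]\!]^\alpha = p^M([\![Q_1]\!]^\alpha, \ldots, [\![Q_n]\!]^\alpha)$
- $[\![\neg F]\!]^\alpha = 1$ iff $[\![F]\!]^\alpha = 0$
- $[\![F \wedge F']\!]^\alpha = 1$ iff $[\![F]\!]^\alpha = 1$ and $[\![F']\!]^\alpha = 1$
- $[\![\forall x \in Q.F(x)]\!]^\alpha = 1$ iff $[\![F(x)]\!]^{\alpha, x/u} = 1$ for all $u \in [\![Q]\!]^\alpha$

Semantics of queries $\Gamma \vdash_\Sigma Q : T$ under an assignment $\alpha$:

- $[\![c]\!]^\alpha = c^M$
- $[\![x]\!]^\alpha = \alpha(x)$
- $[\![f(Q_1, \ldots, Q_n)]\!]^\alpha = f^M([\![Q_1]\!]^\alpha, \ldots, [\![Q_n]\!]^\alpha)$
- $[\![R(Q)]\!]^\alpha = \{u \in [\![a']\!] \mid ([\![Q]\!]^\alpha, u) \in [\![R]\!]\}$ for a relation $\vdash_\Sigma R < a, a'$ and a
  query $\Gamma \vdash_\Sigma Q : a$;
  informally, $[\![R(Q)]\!]^\alpha$ is the image of $[\![Q]\!]^\alpha$ under $[\![R]\!]$
- $[\![\bigcup_{x \in Q} Q'(x)]\!]^\alpha$ is the union of all sets $[\![Q'(x)]\!]^{\alpha, x/u}$ where $u$ runs over all
  elements of $[\![Q]\!]^\alpha$
- $[\![\{x \in Q \mid F(x)\}]\!]^\alpha$ is the subset of $[\![Q]\!]^\alpha$ containing all elements $u$ for which
  $[\![F(x)]\!]^{\alpha, x/u} = 1$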

Remark 2. It is easy to prove that if all concept and relation symbols are
interpreted as finite sets and if all function symbols with result type $set(t)$ always
return finite sets, then all well-formed queries of type $set(t)$ denote a finite
subset of $[\![t]\!]$. Moreover, if the interpretations of the function and predicate symbols
are computable functions, then the interpretation of queries is computable as
well. This holds even if base types are interpreted as infinite sets.

Fig. 7. Predefined Symbols



2.3 Predefined Symbols 

We use a number of predefined function and predicate symbols as given in
Fig. 7. These are assumed to be implicitly declared in every signature, and their
semantics is fixed. All of these symbols are overloaded for all simple types $t$.
Moreover, we use special notations for them.

All of this is completely analogous to the usual treatment of equality as a 
predefined predicate symbol in first-order logic. The only difference is that our 
slightly richer type system calls for a few additional predefined symbols. 

It is easy to add further predefined symbols, in particular equality of sets 
(which, however, may be inefficient to decide) and binary union of queries. We 
omit these here for simplicity. 



2.4 Definable Queries 

Using the predefined symbols, we can define a number of further useful query 
formation operators: 

Example 3. Using the singleton symbol $\{\_\}$, we can define for $\Gamma \vdash_\Sigma Q : set(t)$
and $\Gamma, x : t \vdash_\Sigma q(x) : t'$

$\{q(x) : x \in Q\} := \bigcup_{x \in Q} \{q(x)\}$ of type $set(t')$.

It is easy to show that, semantically, this is the replacement operator, i.e.,
$[\![\{q(x) : x \in Q\}]\!]^\alpha$ is the set containing exactly the elements $[\![q(x)]\!]^{\alpha, x/u}$ for any
$u \in [\![Q]\!]^\alpha$.
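The claim of Example 3 follows by unfolding the semantics of the union operator from Def. 2. The derivation below is a sketch that assumes the singleton symbol is interpreted as the set-theoretic singleton, which is the natural reading but is not spelled out in the text reproduced here.

```latex
\begin{align*}
[\![\{q(x) : x \in Q\}]\!]^\alpha
  &= [\![\textstyle\bigcup_{x \in Q} \{q(x)\}]\!]^\alpha
     && \text{definition of the operator} \\
  &= \bigcup_{u \in [\![Q]\!]^\alpha} [\![\{q(x)\}]\!]^{\alpha, x/u}
     && \text{semantics of } \textstyle\bigcup \\
  &= \bigcup_{u \in [\![Q]\!]^\alpha} \bigl\{ [\![q(x)]\!]^{\alpha, x/u} \bigr\}
     && \text{singleton (assumed semantics)} \\
  &= \bigl\{ [\![q(x)]\!]^{\alpha, x/u} \bigm| u \in [\![Q]\!]^\alpha \bigr\}
\end{align*}
```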

Example 4 (SQL-style Queries). For a query $\vdash_\Sigma Q : set(a_1 \times \ldots \times a_N)$, natural
numbers $n_1, \ldots, n_k \in \{1, \ldots, N\}$, and a proposition $x_1 : a_1, \ldots, x_N : a_N \vdash_\Sigma
F(x_1, \ldots, x_N) : prop$, we write

select $n_1, \ldots, n_k$ from $Q$ where $F(1, \ldots, N)$

for the query

$\{x_{n_1} * \ldots * x_{n_k} : x \in \{y \in Q \mid F(y_1, \ldots, y_N)\}\}$
of type $set(a_{n_1} \times \ldots \times a_{n_k})$.
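As a rough illustration of this definition, the select operator amounts to a comprehension followed by a replacement that projects the chosen components. The function name `select`, the 1-based column convention taken from the example, and the rows below are assumptions for this sketch.

```python
def select(columns, q, where):
    """select n_1..n_k from q where F: filter the tuples, then project (1-based columns)."""
    return {tuple(row[n - 1] for n in columns) for row in q if where(*row)}

# Hypothetical query result of type set(id x id x int): theory, constant, arity.
rows = {("Group", "inv", 1), ("Monoid", "unit", 0), ("Magma", "op", 2)}

# Select constant and theory (in that order) for all constants of arity >= 1.
with_arguments = select([2, 1], rows, lambda thy, const, arity: arity >= 1)
```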

Example 5 (XQuery-style Queries). For queries $\vdash_\Sigma Q : set(a)$ and $x : a \vdash_\Sigma q'(x) : a'$
and $x : a, y : a' \vdash_\Sigma Q''(x, y) : set(a'')$, and a proposition $x : a, y : a' \vdash_\Sigma
F(x, y) : prop$, we write

for $x$ in $Q$ let $y = q'(x)$ where $F(x, y)$ return $Q''(x, y)$

for the query

$\bigcup_{z \in P} Q''(z_1, z_2)$ with $P := \{z \in \{x * q'(x) : x \in Q\} \mid F(z_1, z_2)\}$
of type $set(a'')$.
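The three steps of this definition (replacement, comprehension, union) can be mirrored directly in code. The function name `flwr` and the toy inputs are hypothetical; the sketch only shows how the for-let-where-return form decomposes into the kernel operators.

```python
def flwr(q, let, where, ret):
    """for x in q let y = let(x) where where(x, y) return ret(x, y)."""
    pairs = {(x, let(x)) for x in q}               # {x * q'(x) : x in Q}
    p = {z for z in pairs if where(z[0], z[1])}    # comprehension defining P
    result = set()
    for z in p:                                    # union over P of Q''(z_1, z_2)
        result |= ret(z[0], z[1])
    return result

# Hypothetical input: words paired with their lengths, keeping only the longer ones.
words = {"set", "query", "type"}
out = flwr(words, len,
           lambda x, y: y > 3,
           lambda x, y: {(x, y), (x.upper(), y)})
```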

Example 6 (DL-style Queries). For a relation $R < a, a'$, a concept $c < a$, and
a query $\vdash_\Sigma Q : set(a')$, we write $\forall_c R.Q$ for the query $\{x \in c \mid \forall y \in R(x).y \in Q\}$
of type $set(a)$.

Note that, contrary to the universal restriction $\forall R.Q$ in description logic,
we have to restrict the query to all $x$ of concept $c$ instead of querying for all $x$
of type $a$. This makes sense in our setting because we assume that we can only
iterate efficiently over concepts but not over (possibly infinite!) base types.

However, this is not a loss of generality: individual signatures may always
couple a base type $a$ with a concept $is_a$ such that $[\![is_a]\!] = [\![a]\!]$.
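A small sketch of the relativized universal restriction makes the role of the concept visible: the outer iteration runs over the concept, not over the base type. The function name `universal_restriction` and the toy data are assumptions for illustration.

```python
def image(r, u):
    """All w with (u, w) in the relation r (a set of pairs)."""
    return {w for (v, w) in r if v == u}

def universal_restriction(concept, r, q):
    """forall_c R.Q = {x in c | for all y in R(x): y in Q}."""
    return {x for x in concept if all(y in q for y in image(r, x))}

# Hypothetical data: which theories include only theories from the set q?
theory = {"Magma", "Monoid", "Group"}
includes = {("Monoid", "Magma"), ("Group", "Monoid"), ("Group", "Magma")}
q = {"Magma"}

result = universal_restriction(theory, includes, q)
```

Note that `Magma` is part of the result because it includes nothing, so the universally quantified condition holds vacuously.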
Fig. 8. The QMT Signature for MMT 



3 Querying MMT Libraries 



We will now fix an MMT-specific signature $\Sigma$ that customizes QMT with the
MMT ontology as well as with several functions and predicates based on the
MMT specification. The declarations of $\Sigma$ are listed in Fig. 8.

For simplicity, we avoid presenting any details of MMT and refer to [RK11]
for a comprehensive description. For our purposes, it is sufficient to know that
MMT organizes mathematical knowledge in a simple ontology that can be seen
as the fragment of OMDoc pertaining to formal theories. We will explain the
necessary details below when explaining the respective $\Sigma$-symbols.

An MMT library is any set of MMT declarations (not necessarily well-typed
or closed under dependency). We will assume a fixed library L in the following.
Based on L, we will define a model $M$ by giving the interpretation $s^M$ for every
symbol $s$ listed in Fig. 8.



Base Types We use three base types. Firstly, every MMT declaration has a
globally unique canonical identifier, its MMT URI. We use this to define $id^M$
as the set of all MMT URIs declared in L.

Secondly, $obj^M$ is the set of all OpenMath objects that can be formed from the symbols
in $id^M$. In order to handle objects with free variables conveniently, we use the
following convention: All objects in $obj^M$ are technically closed; but we permit
the use of a special binder symbol free, which can be used to formally bind
the free variables. This has the advantage that the context of an object, which
may carry, e.g., type attributions, is made explicit. Using general OpenMath
objects means that the type obj is subject to exactly $\alpha$-equality and attribution
flattening, the only equalities defined in the OpenMath standard. The much
more difficult problem of queries relative to a stronger equality relation remains
future work.

The remaining base type xml is a generic container for any non-MMT XML
data such as HTML or presentation MathML. Thus, $xml^M$ is the set of all XML
elements. This is useful because the MMT API contains several functions that
return XML.

Ontology For simplicity, we restrict attention to the most important notions of 
the Mmt ontology; adding the remaining notions is straightforward. The ontol- 
ogy only covers the Mmt declarations, all of which have canonical identifiers. 
Thus, all concepts refine the type id, and all relations are between identifiers. 

Among the Mmt concepts, theories are used to represent logics, theories of 
a logic, ontologies, type theories, etc. They contain constants, which represent 
function symbols, type operators, inference rules, theorems, etc. Constants may 
have OpenMath objects [BCC+04] as their type or definiens. Theories are
related via theory morphisms called views. These are truth-preserving transla- 
tions from one theory to another and represent translations and models. Theories 
and views together form a multi-graph of theories across which theorems can be 
shared. Finally, styles contain notations that govern the translation from con- 
tent to presentation markup. 

Mmt theories, views, and styles can be structured by a strong module sys- 
tem. The most important modular construct is the includes relation for explicit 
imports. The declares relation relates every theory to the constants it declares; 
this includes the constants that are not explicitly declared in L but induced 
by the module system. Finally, two further relations connect each view to its 
domain and codomain. 

All concepts and relations are interpreted in the obvious way. For example,
the set $theory^M$ contains the MMT URIs of all theories in L.

Function and Predicate Symbols Regarding the function and predicate symbols,
we are very flexible because a wide range of operations can be defined for MMT
libraries. In particular, every function implemented in the MMT API can be
easily exposed as a $\Sigma$-symbol. Therefore, we only show a selection of symbols
that showcase the potential.



In Sect. 2, we have deliberately omitted partial function symbols in order
to simplify the presentation of our language. However, in practice, it is often
necessary to add them. For example, $def^M$ must be a partial function because
(i) the argument might not be the MMT URI of a constant declaration in L,
or (ii) even if it is, that constant may be declared without a definiens. The
best solution for an elegant treatment of partial functions is to use option types
$opt(t)$ akin to set types $set(t)$. However, for simplicity, we make $[\![-]\!]$ a partial
function that is undefined whenever the interpretation of its argument runs into
an undefined function application. This corresponds to the common concept of
queries returning an error value.
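One way to picture this treatment of partiality is to let an undefined function application abort the evaluation of the whole query, which then yields an error value. The exception class `Undefined`, the function `definiens` (standing in for def, which is a reserved word in Python), the wrapper `evaluate`, and the toy library are all hypothetical names chosen for this sketch.

```python
class Undefined(Exception):
    """Raised when a partial function symbol is applied outside its domain."""

# Hypothetical library: constant identifier -> (type, definiens or None).
constants = {"unit": ("carrier", None), "two": ("nat", "succ(succ(zero))")}

def definiens(u):
    """Partial function: undefined for non-constants and for constants without definiens."""
    if u not in constants:
        raise Undefined(f"{u} is not a constant")
    body = constants[u][1]
    if body is None:
        raise Undefined(f"{u} has no definiens")
    return body

def evaluate(query):
    """Run a query (a zero-argument callable); an undefined application yields an error value."""
    try:
        return ("ok", query())
    except Undefined as err:
        return ("error", str(err))

defined = evaluate(lambda: {definiens("two")})
failed = evaluate(lambda: {definiens(u) for u in ["two", "unit"]})
```

The second query is undefined as a whole even though one of its applications succeeds, which matches the convention that undefinedness propagates to the enclosing expression.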

The partial functions $type^M$ and $def^M$ take the identifier of a constant
declaration and return its type or definiens, respectively. They are undefined for
other identifiers.

The partial function $infer^M(u, o)$ takes an object $o$ and returns its dynamically
inferred type. It is undefined if $o$ is ill-typed. Since MMT does not commit
to a type system, the argument $u$ must identify the type system (which
is represented as an MMT theory itself). If $o$ is a binding object of the form
$OMBIND(OMS(free), \Gamma, d)$, the type of $d$ is inferred in context $\Gamma$.

$arg_p$ is a family of function symbols indexed by a natural number $p$. $p$
indicates the position of a direct subobject (usually an argument), and $arg_p^M(o)$
is the subobject of $o$ at position $p$. In particular, $arg_i^M(OMA(f, a_1, \ldots, a_n)) = a_i$.
Note that arbitrary subobjects can be retrieved by iterating $arg_p$. Similarly,
$subobj^M(o, h)$ is the set of all subobjects of $o$ whose head is the symbol
with identifier $h$. In particular, the head of $OMA(OMS(h), a_1, \ldots, a_n)$ is $h$. In
both cases, we keep track of the free variables, e.g., if $p$ is the position of the body $o$
and $b \neq OMS(free)$, then $arg_p^M(OMBIND(b, \Gamma, o)) = OMBIND(OMS(free), \Gamma, o)$.
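The behavior of arg and subobj on applications can be sketched with a toy term representation. This is not the MMT API: the tuple encoding, the constructors `OMS` and `OMA`, and the helper `head` are assumptions made for this example, binders and the tracking of free variables are left out, and a bare symbol is treated as having no head.

```python
def OMS(name):
    """Toy encoding of an OpenMath symbol."""
    return ("OMS", name)

def OMA(f, *args):
    """Toy encoding of an application of f to args."""
    return ("OMA", f) + args

def arg(p, o):
    """Direct subobject of the application o at position p (position 0 is the function)."""
    if o[0] != "OMA" or not 0 <= p < len(o) - 1:
        raise ValueError("no subobject at this position")
    return o[1 + p]

def head(o):
    """Identifier h if o has the form OMA(OMS(h), ...), otherwise None."""
    if o[0] == "OMA" and o[1][0] == "OMS":
        return o[1][1]
    return None

def subobj(o, h):
    """Set of all subobjects of o (including o itself) whose head is the symbol h."""
    found = {o} if head(o) == h else set()
    if o[0] == "OMA":
        for child in o[1:]:
            found |= subobj(child, h)
    return found

plus, x, y = OMS("plus"), OMS("x"), OMS("y")
inner = OMA(plus, y, x)
term = OMA(plus, x, inner)   # plus(x, plus(y, x))
```

Iterating `arg` reaches nested subobjects, e.g. `arg(1, arg(2, term))` is the symbol `y`.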

$unify^M(O)$ performs an object query: It returns the set of all tuples $u * o * s$
where $u$ is the MMT URI of a declaration in L that contains an object $o$ that
unifies with $O$ using the substitution $s$. Here we use a purely syntactic definition
for unifiability of OpenMath objects.

$render^M(o, u)$ and $render^M(d, u)$ return the presentation markup dynamically
computed by the MMT rendering engine. This is useful because the query
and the rendering engine are often implemented on the same remote server.
Therefore, it is reasonable to compute the rendering of the query results, if
desired, as part of the query evaluation. Moreover, larger signatures might provide
additional functions to further operate on the presentation markup. render is
overloaded because we can present both MMT declarations and MMT objects.
In both cases, $u$ is the MMT URI of the style providing the notations for the
rendering.

The predicate symbol occurs takes an object O and an identifier u, and 
returns true if u occurs in O. 

Finally, we permit literals, i.e., arbitrary URIs and arbitrary OpenMath
objects may be used as nullary constants, which are interpreted as themselves
(or as undefined if they are not in the universe). This is somewhat inelegant
but necessary in practice to form interesting queries. A more sophisticated QMT
signature could use one function symbol for every OpenMath object constructor
instead of using OpenMath literals.

Example 7. An MMT theory graph is the multigraph formed by using the
theories as nodes and all theory morphisms between them as edges. The components
of the theory graph can be retrieved with a few simple queries.

Firstly, the set of theories is retrieved simply using the query $theory$. Secondly,
the theory morphisms are obtained by two different queries:

views       $\{v * x * y : v \in view, x \in domain(v), y \in codomain(v)\}$
inclusions  $\bigcup_{y \in theory} \{x * y : x \in includes^*(y)\}$

The first one returns all view identifiers with their domain and codomain. Here
we use an extension of the replacement operator $\{\_ : \_\}$ from Ex. 3 to multiple
variables. It is straightforward to define in terms of the unary one. The second
query returns all pairs of theories between which there is an inclusion morphism.

Example 8. Consider a constant identifier $\exists I$ for the introduction rule of the
existential quantifier from the natural deduction calculus. It produces a
constructive existence proof of $\exists x.P(x)$; it takes two arguments: a witness $w$ and a
proof of $P(w)$. Moreover, consider a theorem with identifier $u$. Recall that using
the Curry-Howard representation of proofs-as-objects, a theorem $u$ is a constant
whose type is the asserted formula and whose definiens is the proof.

Then the following query retrieves all existential witnesses that come up in
the proof of $u$:

$\{arg_1(x) : x \in subobj(def(u), \exists I)\}$

Here we have used the replacement operator introduced in Ex. 3.
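The witness query can be traced on a toy proof term. The tuple encoding of terms, the symbol names `existsI` and `andI`, the proof stored for `u`, and the function `definiens` (standing in for def) are all hypothetical; the sketch only shows how the replacement operator combines subobj and arg.

```python
def OMS(name):
    """Toy encoding of an OpenMath symbol."""
    return ("OMS", name)

def OMA(f, *args):
    """Toy encoding of an application of f to args."""
    return ("OMA", f) + args

def subobj(o, h):
    """Set of all subobjects of o that are applications of the symbol h."""
    found = set()
    if o[0] == "OMA":
        if o[1] == OMS(h):
            found.add(o)
        for child in o[1:]:
            found |= subobj(child, h)
    return found

def arg(p, o):
    """Argument of the application o at position p (position 0 is the function)."""
    return o[1 + p]

# Hypothetical definiens of a theorem u: a proof term that applies existsI three times.
proofs = {"u": OMA(OMS("andI"),
                   OMA(OMS("existsI"), OMS("zero"), OMS("p0")),
                   OMA(OMS("existsI"), OMS("one"),
                       OMA(OMS("existsI"), OMS("two"), OMS("p2"))))}

def definiens(u):
    """Stand-in for the function symbol def."""
    return proofs[u]

# {arg_1(x) : x in subobj(def(u), existsI)}
witnesses = {arg(1, x) for x in subobj(definiens("u"), "existsI")}
```

The nested application is found as well, so the witness `two` inside the second proof is part of the result.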

Example 9 (Continuing Ex. 8). Note that when using $\exists I$, the proved formula $P$
is present only implicitly as the type of the second argument of $\exists I$. If the type
system is given by, for example, LF and type inference for LF is available, we
can extend the query from Ex. 8 as follows:

$\{arg_1(x) * infer(LF, arg_2(x)) : x \in subobj(def(u), \exists I)\}$

This will retrieve all pairs $(w, P)$ of witnesses and proved formulas that come
up in the proof of $u$.

4 Implementation 

We have implemented QMT as a part of the Mmt API. The implementation 
includes a concrete XML syntax for queries and an integration with the Mmt 
web server, via which the query engine is exposed to users. The server can 
run as a background service on a local machine as well as a dedicated remote 
server. Sources, binaries, and documentation are available at the project web 
site [Rab08]. 



The Mmt API already implements the Mmt ontology so that appropriate 
indices for the semantics of all concept and relation symbols are available. Indices 
scale well because they are written to the hard drive and cached to memory 
on demand. With two exceptions, the semantics of all function and predicate 
symbols is implemented by standard Mmt API functions. 

The semantics of unify is computed differently: A substitution tree index of
the queried library is maintained separately by an installation of MathWebSearch
[KŞ06]. Thus, QMT automatically inherits some heuristics of MathWebSearch,
such as unification up to symmetry of certain relation symbols. MathWebSearch
and the query engine run on the same machine and communicate via HTTP.

Another subtlety is the semantics of infer. The Mmt API provides a plugin 
interface, through which individual type systems can be registered; the first 
argument to $infer^M$ is used to choose an applicable plugin. In particular, we
provide a plugin for the logical framework LF [HHP93], which handles type 
inference for any type system that is formalized in LF; this covers all type systems 
defined in the LATIN library [CHK+11] and thus also applies to our imports of 
the Mizar [TB85] and TPTP libraries [SS98]. 

Query servers for individual libraries can be set up easily. In fact, because 
the Mmt API abstracts from different backends, queries automatically return 
results from all libraries that are registered with a particular instance of the 
Mmt API. This permits queries across libraries, which is particularly interesting 
if libraries share symbols. Shared symbols arise, for example, if both libraries use 
the standard OpenMath CDs where possible or if overlap between the libraries'
underlying meta-languages is explicated in an integrating framework like the
LATIN atlas [CHK+11]. 

Example 10. The LATIN library [CHK+11] consists of over 1000 highly modu- 
larized LF signatures and views between them, formalizing a variety of logics, 
type theories, set theories, and related formal systems. Validating the library 
and producing the index for the Mmt ontology takes a few minutes with typical 
desktop hardware; reading the index into memory takes a few seconds. Typical 
queries as given in this paper are evaluated within seconds. 

As an extreme example, consider the query $Q = declares(theory)$. It returns
in less than a second the about 2000 identifiers that are declared in any theory.
The query $\bigcup_{x \in Q} \{x * type(x)\}$ returns the same number of results but pairs every
declaration with its type. This requires the query engine to read the types of all 
declarations (as opposed to only their identifiers) . If none of these are cached in 
memory yet, the evaluation takes about 4 minutes. 

5 Conclusion and Future Work 

We have introduced a simple, expressive query language for mathematical theo- 
ries (QMT) that combines features of compositional, property, and object query 
languages. QMT is implemented on top of the MMT API; this provides any
library that is serialized as MMT content markup with a scalable, versatile
querying engine out of the box. As both MMT and its implementation are designed to
admit natural representations of any declarative language, QMT can be readily
applied to many libraries including, e.g., those written in Twelf, Mizar, or TPTP.

Our presentation focused on querying formal mathematical libraries. This 
matches our primary motivation but is neither a theoretical nor a practical re- 
striction. For example, it is straightforward to add a base type for presentation 
MathML and some functions for it. MathWebSearch can be easily generalized
to permit unification queries on presentation markup. This also permits queries 
that mix content and presentation markup, or content queries that find presen- 
tation results. Moreover, for presentation markup that is generated from content 
markup, it is easy to add a function that returns the corresponding content item 
so that queries can jump back and forth between them. 

Similarly, we can give a QMT signature with base types for authors and 
documents (papers, book chapters, etc.) as well as relations like author-of and 
cites. It is easy to generate the necessary indices from existing databases and to 
reuse our implementation for them. Moreover, with a relation mentions between 
papers and the type id of mathematical concepts, we can combine content and 
narrative aspects in queries. An index for the mentions relation is of course 
harder to obtain, which underscores the desirability of mathematical documents 
that are annotated with content URIs. 